Final Project Written Report

Author

Alex Arrieta, Rayan Gendre, Alvaro Ramos, Jackson Isidor

Code
library(tidyverse)
library(here)
library(knitr)
library(gganimate)
library(broom)

# loading in our data sets
gdp <- read_csv(here("data", "gdp_pcap.csv"))
wateraccess <- read_csv(here("data", "at_least_basic_water_source_overall_access_percent.csv"))

1 Project Proposal

For this project, we are interested in assessing the relationship between a country’s GDP per capita and the percentage of people within that country who have access to a basic water source. To accomplish this, we will use two datasets from Gapminder that correspond to our variables of interest.

1.1 The Data

Our two variables of interest (and their related datasets) are:

At least a basic water source, overall access % [Response]

This variable’s dataset gives the percentage of people who use at least basic water services, by country and by year, from 2000 to 2022. The percentage includes both the people who have access to basic water services and those who have access to safe water services. The dataset defines a basic water service as water from an improved source for which collection takes no more than 30 minutes for a round trip.

GDP per capita (Price and inflation adjusted) [Explanatory]

This variable’s dataset displays the Gross domestic product per person adjusted for differences in purchasing power, where each observation displays information for a Country in a given Year beginning in 1800 and going to present day. The values are displayed in international dollars and are fixed at 2017 prices.

1.2 Hypothesized Relationship

We hypothesize that there will be a positive relationship between the percentage of a country’s population that has access to water and its GDP per capita. That is, as the percentage of the population with access to water increases, the country’s GDP per capita will also tend to increase. Our basis for this is that countries with more prominent global economies, such as the United States and European countries, have better access to water and safer, more modern ways for their citizens to obtain it. When people have better access to water, they can spend more time developing businesses and other areas of the country, which leads to a stronger economy and a higher GDP per capita.

1.3 Data Cleaning

Because each variable has its own dataset, we have to clean and combine the two datasets to assess the relationship between the two variables. Starting with the GDP per capita dataset, some values have the letter k appended to the number, so some year columns are read as character while others are read as numeric, which prevents us from pivoting the dataset into a longer format. To get around this, we converted every column in the dataset to character so that all columns share the same data type. Once the data had been pivoted, we addressed the embedded k’s by extracting the numeric part of each value that contained a k, converting it to a numeric data type, and multiplying it by 1000. Afterwards, we converted the remaining variables (country and year) to appropriate data types.

The process for cleaning the water access dataset was much easier than for the GDP per capita dataset, as we did not have to deal with embedded k’s. All we had to do for this dataset was pivot it and then convert the remaining columns to appropriate data types. Additionally, we dropped any observations that contained missing (NA) values.

Once each dataset had been pivoted and cleaned, we could then join our two datasets through an inner join by country and by year. To help with visualization, we also created a new variable called region which indicates the world region that a country resides in.
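The k-suffix conversion described above can be checked on a few toy strings before it is applied to the full dataset. The sketch below uses base R only; `parse_k` is a hypothetical helper introduced for illustration and is not part of the report's code, which performs the same steps with stringr inside `mutate()`.

```r
# Hypothetical helper (not from the report): convert strings such as "10.6k" to numbers
parse_k <- function(x) {
  # Flag the values that end in "k"
  has_k <- grepl("k$", x)
  # Strip a trailing "k" (if any) and convert what is left to numeric
  value <- as.numeric(sub("k$", "", x))
  # Scale only the flagged values by 1000
  ifelse(has_k, value * 1000, value)
}

parse_k(c("794", "10.6k", "57k"))
```

Values without a k pass through unchanged, and missing values stay missing.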

Code
# Pivoting and cleaning the gdp dataset
gdp_clean <- gdp |>
  mutate(across(.cols = everything(),
                .f = ~ as.character(.x)
                )
         ) |>
  pivot_longer(cols = `1800`:`2100`, 
               names_to = "year", 
               values_to = "gdp_pc"
               ) |>
  mutate(gdp_pc = if_else(condition = str_detect(gdp_pc, "k$"),
                          true = as.numeric(str_extract(gdp_pc, "[0-9.]+")) * 1000,
                          false = as.numeric(gdp_pc)
                          ),
         country = as.factor(country),
         year = as.numeric(year)
         )

# Pivoting and cleaning the water_access dataset
wateraccess_clean <- wateraccess |>
  pivot_longer(cols = `2000`:`2022`,
               names_to = "year",
               values_to = "access_pct",
               values_drop_na = TRUE
               ) |>
  mutate(country = as.factor(country),
         year = as.numeric(year),
         access_pct = as.numeric(access_pct)
         )

# Joining the two datasets
gdp_v_water <- gdp_clean |> 
  inner_join(wateraccess_clean,
            by = join_by(country, year)
            ) |> 
  mutate(region = fct_collapse(country,
                               "Asia" = c("Afghanistan",
                                          "Australia",
                                          "Bahrain",
                                          "Bangladesh",
                                          "Bhutan",
                                          "Brunei",
                                          "Cambodia",
                                          "China",
                                          "Hong Kong, China",
                                          "Fiji",
                                          "India",
                                          "Indonesia",
                                          "Iran",
                                          "Iraq",
                                          "Israel",
                                          "Japan",
                                          "Jordan",
                                          "Kazakhstan",
                                          "Kiribati",
                                          "Kuwait",
                                          "Kyrgyz Republic",
                                          "Lao",
                                          "Lebanon",
                                          "Malaysia",
                                          "Maldives",
                                          "Marshall Islands",
                                          "Micronesia, Fed. Sts.",
                                          "Mongolia",
                                          "Myanmar",
                                          "Nauru",
                                          "Nepal",
                                          "New Zealand",
                                          "North Korea",
                                          "Oman",
                                          "Pakistan",
                                          "Palestine",
                                          "Palau",
                                          "Papua New Guinea",
                                          "Philippines",
                                          "Qatar",
                                          "Samoa",
                                          "Saudi Arabia",
                                          "Singapore",
                                          "Solomon Islands",
                                          "South Korea",
                                          "Sri Lanka",
                                          "Syria",
                                          "Tajikistan",
                                          "Thailand",
                                          "Timor-Leste",
                                          "Tonga",
                                          "Turkmenistan",
                                          "Tuvalu",
                                          "UAE",
                                          "Uzbekistan",
                                          "Vanuatu",
                                          "Vietnam",
                                          "Yemen"),
                               "Africa" = c("Algeria",
                                          "Angola",
                                          "Benin",
                                          "Botswana",
                                          "Burkina Faso",
                                          "Burundi",
                                          "Cameroon",
                                          "Cape Verde",
                                          "Central African Republic",
                                          "Chad",
                                          "Comoros",
                                          "Congo, Dem. Rep.",
                                          "Congo, Rep.",
                                          "Cote d'Ivoire",
                                          "Djibouti",
                                          "Egypt",
                                          "Equatorial Guinea",
                                          "Eritrea",
                                          "Eswatini",
                                          "Ethiopia",
                                          "Gabon",
                                          "Gambia",
                                          "Ghana",
                                          "Guinea",
                                          "Guinea-Bissau",
                                          "Kenya",
                                          "Lesotho",
                                          "Liberia",
                                          "Libya",
                                          "Madagascar",
                                          "Malawi",
                                          "Mali",
                                          "Mauritania",
                                          "Mauritius",
                                          "Morocco",
                                          "Mozambique",
                                          "Namibia",
                                          "Niger",
                                          "Nigeria",
                                          "Rwanda",
                                          "Sao Tome and Principe",
                                          "Senegal",
                                          "Seychelles",
                                          "Sierra Leone",
                                          "Somalia",
                                          "South Africa",
                                          "South Sudan",
                                          "Sudan",
                                          "Tanzania",
                                          "Togo",
                                          "Tunisia",
                                          "Uganda",
                                          "Zambia",
                                          "Zimbabwe"),
                               "Americas" = c("Antigua and Barbuda",
                                            "Argentina",
                                            "Bahamas",
                                            "Barbados",
                                            "Belize",
                                            "Bolivia",
                                            "Brazil",
                                            "Canada",
                                            "Chile",
                                            "Colombia",
                                            "Costa Rica",
                                            "Cuba",
                                            "Dominica",
                                            "Dominican Republic",
                                            "Ecuador",
                                            "El Salvador",
                                            "Grenada",
                                            "Guatemala",
                                            "Guyana",
                                            "Haiti",
                                            "Honduras",
                                            "Jamaica",
                                            "Mexico",
                                            "Nicaragua",
                                            "Panama",
                                            "Paraguay",
                                            "Peru",
                                            "St. Kitts and Nevis",
                                            "St. Lucia",
                                            "St. Vincent and the Grenadines",
                                            "Suriname",
                                            "Trinidad and Tobago",
                                            "USA",
                                            "Uruguay",
                                            "Venezuela"),
                               "Europe" = c("Albania",
                                            "Andorra",
                                            "Armenia",
                                            "Austria",
                                            "Azerbaijan",
                                            "Belarus",
                                            "Belgium",
                                            "Bosnia and Herzegovina",
                                            "Bulgaria",
                                            "Croatia",
                                            "Cyprus",
                                            "Czech Republic",
                                            "Denmark",
                                            "Estonia",
                                            "Finland",
                                            "France",
                                            "Georgia",
                                            "Germany",
                                            "Greece",
                                            "Hungary",
                                            "Iceland",
                                            "Ireland",
                                            "Italy",
                                            "Latvia",
                                            "Lithuania",
                                            "Luxembourg",
                                            "Malta",
                                            "Moldova",
                                            "Monaco",
                                            "Montenegro",
                                            "Netherlands",
                                            "North Macedonia",
                                            "Norway",
                                            "Poland",
                                            "Portugal",
                                            "Romania",
                                            "Russia",
                                            "San Marino",
                                            "Serbia",
                                            "Slovak Republic",
                                            "Slovenia",
                                            "Spain",
                                            "Sweden",
                                            "Switzerland",
                                            "Turkey",
                                            "UK",
                                            "Ukraine"),
                               other_level = "other"
                               ),
         .after = country
         )

kable(head(gdp_v_water), format = "markdown")
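| country     | region | year | gdp_pc | access_pct |
|:------------|:-------|-----:|-------:|-----------:|
| Afghanistan | Asia   | 2000 |    794 |       27.4 |
| Afghanistan | Asia   | 2001 |    775 |       27.5 |
| Afghanistan | Asia   | 2002 |   1260 |       29.7 |
| Afghanistan | Asia   | 2003 |   1280 |       31.9 |
| Afghanistan | Asia   | 2004 |   1260 |       34.1 |
| Afghanistan | Asia   | 2005 |   1350 |       36.3 |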

2 Assessing the Linear Relationship

Now that we have our combined dataset, we can assess the relationship between a country’s GDP per capita and the percentage of its population that has access to water. Before formally analyzing this relationship through a fitted linear regression model, we would like to look at the relationship between the two variables over time.

2.1 Data Visualization

Code
gdp_v_water |> 
  ggplot(aes(x = gdp_pc,
             y = access_pct,
             fill = region,
             color = region,
             shape = region)
         ) +
  geom_point() +
  theme_bw() +
  scale_color_manual(values = c("#e66101",
                                "#fdb863",
                                "#b2abd2",
                                "#5e3c99")
                     )+
  scale_fill_manual(values = c("#e66101",
                                "#fdb863",
                                "#b2abd2",
                                "#5e3c99")
                     ) +
  scale_shape_manual(values = 21:24) +
  labs(x = "GDP Per Capita",
       y = "",
       subtitle = "% of Pop. with Basic Water Access",
       title = "Relationship between Water Access and GDP",
       tag = "Year: {frame_time}",
       fill = "World Region",
       color = "World Region",
       shape = "World Region"
       ) +
  transition_time(as.integer(year)) +
  ease_aes('linear')

When assessing the relationship between water access and GDP per capita over time through an animated plot, we can see that there generally seems to be a strong, positive, non-linear relationship between the two variables. While countries with larger GDP per capita tend to have higher rates of water access, this trend only seems to apply to countries with a GDP per capita of about $50,000 or less. In countries with a GDP per capita of $50,000 or greater, almost 100% of the population usually has access to a basic water source. Additionally, as time goes on, the percentage of the population with water access improves overall, with most countries increasing this percentage over time. One other notable observation from the animated plot is that the world region with seemingly the highest percentage of water access on average is Europe, while the region with the lowest percentage on average seems to be Africa (which makes sense, as European countries seem to have the highest GDP per capita while African countries seem to have the lowest).

Because we can see that the relationship between GDP per capita and Water Access is far from being linear, we would like to perform a logit transformation on the average proportion of a country’s population that has basic water access (our response variable) before fitting a linear model to help with this issue of non-linearity using the following formula:

ln(average water access / (101 - average water access))
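Written out with symbols, the transformation and its inverse are given below, where p is a country's average water access in percent and y is the transformed response (both symbols are introduced here for illustration). The denominator uses 101 rather than 100, presumably so that a country with 100% access does not cause a division by zero; that reading is an assumption, since the report does not state the reason.

```latex
y = \ln\!\left(\frac{p}{101 - p}\right)
\qquad\Longleftrightarrow\qquad
p = \frac{101\,e^{y}}{1 + e^{y}}

% Example: a country with p = 100 has y = \ln(100/1) = \ln 100 \approx 4.605
```

The inverse on the right is what converts a predicted value from the linear model back to a percentage.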

A plot containing a fitted linear model alongside our transformed response variable follows.

Code
gdp_v_water_mean <- gdp_v_water |> 
  group_by(country, region) |> 
  summarize(mean_gdp = mean(gdp_pc),
            mean_access = mean(access_pct)
            ) |> 
  mutate(logit_meanaccess = log((mean_access)/(101 - mean_access)))

gdp_v_water_mean |> 
  ggplot(aes(x = mean_gdp,
             y = logit_meanaccess
             )
         ) +
  geom_point() +
  geom_smooth(method = "lm",
              aes(x = mean_gdp,
                  y = logit_meanaccess,
                  group = 1),
              color = "red"
              ) +
  theme_bw() +
  labs(x = "GDP Per Capita (averaged over the years)",
       y = "",
       subtitle = "Logit of the Pop. Percentage with Basic Water Access (averaged over the years)",
       title = "Relationship between LOGIT avg. Water Access and avg. GDP"
       ) +
  scale_y_continuous(limits = c(0, 6))

From this plot, we can see that even with a transformation, there still seems to be only a somewhat weak, positive linear relationship between the average GDP per capita of a country and the logit of the population percentage with access to a basic water source, though the linearity is much improved over the relationship with the original untransformed response. Additionally, there are some data points that could be considered outliers. Most notably, there is one country that has a very high GDP per capita compared to the other countries. By sorting our combined dataset by mean GDP per capita in descending order and taking the first observation, we can determine that the country with the highest average GDP per capita is Monaco, a country in Europe with an average GDP per capita of $186,521.74 and a logit value of 4.6052.

Code
gdp_v_water_mean |> 
  ungroup() |> 
  slice_max(order_by = mean_gdp,
            n = 1)

2.2 Linear Regression

In addition to our plot containing a linear regression model, we can also create and obtain the coefficients for this model.

Code
model <- lm(logit_meanaccess ~ mean_gdp, data = gdp_v_water_mean)
model

Call:
lm(formula = logit_meanaccess ~ mean_gdp, data = gdp_v_water_mean)

Coefficients:
(Intercept)     mean_gdp  
  1.526e+00    4.629e-05  
Code
#summary(model)
#augment(model)
#anova(model)

Our estimated linear regression model (after performing a logit transformation on our response variable) is:

ŷ = predicted ln(mean water access/(101 - mean water access)) = 1.526 + 0.0000463(mean GDP per capita).

Since the intercept for our model is 1.526, the predicted logit value for a country with an average GDP per capita of zero is 1.526 (corresponding to a population percentage of approximately 83.0, obtained by inverting the logit transformation). Similarly, because our model’s slope is 0.0000463, when a country’s average GDP per capita increases by one dollar, the logit value of the population percentage that has access to a basic water source is expected to increase by 0.0000463.
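Because a one-dollar change is too small to interpret easily, it may help to restate the slope for a $10,000 increase in average GDP per capita. The worked arithmetic below uses the slope from the printed model output; the symbols p and p_new (average water access before and after the increase) are introduced here for illustration.

```latex
\Delta \hat{y} = 0.0000463 \times 10{,}000 = 0.463

\frac{p_{\text{new}}}{101 - p_{\text{new}}}
  = e^{0.463}\,\frac{p}{101 - p}
  \approx 1.59\,\frac{p}{101 - p}
```

In words, under this model a $10,000 increase in average GDP per capita multiplies the ratio p/(101 - p) by roughly 1.59.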

2.3 Model Fit

(INCLUDE ASSESSMENT OF LINE CONDITIONS HERE)

Code
kable(
  tibble(LogitAccess_Variance = var(augment(model)$logit_meanaccess),
         Prediction_Variance = var(augment(model)$.fitted),
         Residual_Variance = var(augment(model)$.resid)
         )
)
| LogitAccess_Variance | Prediction_Variance | Residual_Variance |
|---------------------:|--------------------:|------------------:|
|             2.454607 |            1.156303 |          1.298303 |

To assess our model’s fit, we can use the variances of the response, fitted, and residual values. From these, we can calculate the proportion of the variability in our response variable that is explained by our estimated linear regression model (the R^2 value), which in this case is:

1.156/2.455 = 1 - (1.298/2.455) = 0.471

Only about 47.1% of the variation in the response is accounted for by our model, leaving more than half unexplained. This indicates that the model fits only modestly and that the linear relationship between the GDP per capita of a country and the logit value of the percentage of the country’s population that has access to a basic water source is not strong. A linear model may therefore not be the best way to describe the relationship between these two variables, which is consistent with what we saw earlier when plotting the fitted regression line.
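The two ways of computing this proportion agree because, for a least-squares line with an intercept, the variance of the response splits exactly into the variance of the fitted values plus the variance of the residuals. The sketch below checks this on R's built-in `cars` data, which stands in for the report's data so the example is self-contained; all variable names are illustrative.

```r
# Illustration on the built-in `cars` data, not the report's data
fit <- lm(dist ~ speed, data = cars)

var_total    <- var(cars$dist)     # variance of the response
var_fitted   <- var(fitted(fit))   # variance of the fitted values
var_residual <- var(resid(fit))    # variance of the residuals

# The fitted and residual variances add up to the total variance
all.equal(var_total, var_fitted + var_residual)

# Both forms of the calculation give the same R-squared as summary()
r2_explained <- var_fitted / var_total
r2_residual  <- 1 - var_residual / var_total
all.equal(r2_explained, summary(fit)$r.squared)
```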

3 Simulation

To better assess how our fitted linear model performs at explaining the total variability seen in the response variable, we can do some simulation!

3.1 Visualizing Simulations from the Model

To start off, we can generate simulated observations by using the predicted values from our linear regression model and add random errors to each one using the residual standard error from the same model.

Code
set.seed(7256)

# Creating function that adds random noise to simulate 'observed' values
addnoise <- function(x, mean = 0, sd) {
  new_x <- x + rnorm(length(x),
                     mean,
                     sd
                     )
  return(new_x)
}

# Extracting predicted values and standard deviation of residuals from fitted model
logit_predict <- predict(model)
resid_sigma <- sigma(model)

# Testing function and creating a set of simulated observations
sim_logit <- tibble(sim_logit_meanaccess = addnoise(x = logit_predict, sd = resid_sigma))
sim_logit
# A tibble: 194 × 1
   sim_logit_meanaccess
                  <dbl>
 1                3.03 
 2                2.53 
 3                0.406
 4                4.25 
 5                1.54 
 6                2.88 
 7                1.89 
 8                1.53 
 9                4.00 
10                1.61 
# ℹ 184 more rows
Code
# Adding newly generated observations to a dataset containing the original observed values
sim_data <- gdp_v_water_mean |> 
  ungroup() |> 
  filter(is.na(logit_meanaccess) == FALSE,
         is.na(mean_access) == FALSE
         ) |> 
  select(mean_gdp, logit_meanaccess) |> 
  bind_cols(sim_logit)

# Visualizing and comparing the simulation from our model to the original data
sim_data |> 
  ggplot(aes(x = mean_gdp,
             y = logit_meanaccess
             )
         ) +
  geom_point() +
  geom_smooth(method = "lm",
              aes(x = mean_gdp,
                  y = logit_meanaccess,
                  group = 1),
              color = "red"
              ) +
  theme_bw() +
  labs(x = "GDP Per Capita (averaged over the years)",
       y = "",
       subtitle = "Logit of the Pop. Percentage with Basic Water Access (averaged over the years)",
       title = "Relationship between Observed LOGIT avg. Water Access and avg. GDP"
       ) +
  scale_y_continuous(limits = c(0, 6))

Code
sim_data |> 
  ggplot(aes(x = mean_gdp,
             y = sim_logit_meanaccess
             )
         ) +
  geom_point() +
  geom_smooth(method = "lm",
              aes(x = mean_gdp,
                  y = logit_meanaccess,
                  group = 1),
              color = "red"
              ) +
  theme_bw() +
  labs(x = "GDP Per Capita (averaged over the years)",
       y = "",
       subtitle = "Simulated Logit of the Pop. Percentage with Basic Water Access (averaged over the years)",
       title = "Relationship between Simulated LOGIT avg. Water Access and avg. GDP"
       ) +
  scale_y_continuous(limits = c(0, 6))

Code
sim_data |> 
  ggplot(aes(x = sim_logit_meanaccess,
             y = logit_meanaccess)
         ) +
  geom_point() +
  scale_x_continuous(limits = c(0, 10)) +
  scale_y_continuous(limits = c(0, 10)) +
  geom_abline(slope = 1, intercept = 0, color = "red", linewidth = 1) + 
  theme_bw() +
  labs(x = "Simulated Logit of the Pop. Percentage with Basic Water Access (averaged over the years)",
       y = "",
       subtitle = "Observed Logit of the Pop. Percentage with Basic Water Access (averaged over the years)",
       title = "Relationship between Simulated LOGIT avg. Water Access and Observed LOGIT avg. Water Access"
       )

3.2 Generating Multiple Predictive Checks

Instead of creating just one dataset of simulated observations, we can generate 1000 simulated datasets. We can then regress the observed values on each of the 1000 simulated datasets to obtain 1000 R^2 values, and assess the distribution of these R^2 values to determine how well our original linear regression model describes the data (that is, how much of the variability in the observed logit water access is accounted for by data simulated from the linear model).
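The procedure can be sketched compactly in base R. The example below is a minimal illustration on the built-in `cars` data with 200 simulations, not the report's data or its purrr-based code; the names `n_sims` and `sim_r2` are illustrative.

```r
set.seed(1)

# Fit a model and keep its fitted values and residual standard error
fit       <- lm(dist ~ speed, data = cars)
predicted <- fitted(fit)
resid_sd  <- sigma(fit)

# For each simulation: add normal noise to the fitted values, regress the
# observed response on the simulated one, and keep the R-squared
n_sims <- 200
sim_r2 <- replicate(n_sims, {
  simulated <- predicted + rnorm(length(predicted), mean = 0, sd = resid_sd)
  summary(lm(cars$dist ~ simulated))$r.squared
})

summary(sim_r2)
```

If the fitted model really did generate the data, the observed and simulated responses share only the fitted values, so their correlation is roughly the model's own R^2 and these R^2 values should center near its square (for the report's model, 0.471^2 ≈ 0.22).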

Code
set.seed(7256)

# Generating 1000 simulated sets of observations and putting them into one dataset
nsims <- 1000
sim_data2 <- map_dfc(.x = 1:nsims,
                     .f = ~ tibble(sim = addnoise(logit_predict,
                                                  sd = resid_sigma
                                                  )
                                   )
                     )

# Cleaning the simulated dataset
colnames(sim_data2) <- colnames(sim_data2) |> 
  str_replace(pattern = "\\.\\.\\.",
              replacement = "_"
              )

# Combining this simulated dataset with a dataset containing the original observed values
sim_data2 <- gdp_v_water_mean |> 
  ungroup() |> 
  filter(is.na(mean_gdp) == FALSE,
         is.na(logit_meanaccess) == FALSE
         ) |> 
  select(logit_meanaccess) |> 
  bind_cols(sim_data2)

head(sim_data)
# A tibble: 6 × 3
  mean_gdp logit_meanaccess sim_logit_meanaccess
     <dbl>            <dbl>                <dbl>
1    1688.           0.0770                3.03 
2   10617.           2.26                  2.53 
3   10891.           2.38                  0.406
4   57026.           4.61                  4.25 
5    6690.           0.0102                1.54 
6   19900            3.63                  2.88 
Code
# Regressing observed values to each set of simulated values to generate r-squared values
reg_sim <- map(.x = sim_data2,
               .f = ~ lm(logit_meanaccess ~ .x,
                         data = sim_data2)
               ) |> 
  map(.f = ~ glance(.x)) |> 
  map_dbl(.f = ~ .x$r.squared)
  
# Removing the r-squared value corresponding to logit_meanaccess ~ logit_meanaccess
reg_sim <- reg_sim[names(reg_sim) != "logit_meanaccess"]
reg_sim
     sim_1      sim_2      sim_3      sim_4      sim_5      sim_6      sim_7 
0.20914196 0.21148541 0.25847887 0.25228309 0.26960882 0.17772895 0.17757995 
     sim_8      sim_9     sim_10     sim_11     sim_12     sim_13     sim_14 
0.21894337 0.19532884 0.13547097 0.25003140 0.22454559 0.18874008 0.20815972 
    sim_15     sim_16     sim_17     sim_18     sim_19     sim_20     sim_21 
0.17002736 0.21154163 0.25035802 0.24931050 0.17389394 0.30385721 0.20886826 
    sim_22     sim_23     sim_24     sim_25     sim_26     sim_27     sim_28 
0.28991526 0.26896820 0.20109517 0.27030517 0.17367035 0.20962194 0.21279009 
    sim_29     sim_30     sim_31     sim_32     sim_33     sim_34     sim_35 
0.21490715 0.27338537 0.24350575 0.26032701 0.27104741 0.34144762 0.21017809 
    sim_36     sim_37     sim_38     sim_39     sim_40     sim_41     sim_42 
0.21664747 0.15213649 0.32810780 0.19737834 0.19050847 0.25647491 0.22450223 
    sim_43     sim_44     sim_45     sim_46     sim_47     sim_48     sim_49 
0.27299961 0.14824886 0.19854595 0.23568179 0.21555423 0.27690730 0.25458909 
    sim_50     sim_51     sim_52     sim_53     sim_54     sim_55     sim_56 
0.25006720 0.12201218 0.15487391 0.21437008 0.30883319 0.19921507 0.23440312 
    sim_57     sim_58     sim_59     sim_60     sim_61     sim_62     sim_63 
0.17031740 0.27545673 0.24193946 0.23383554 0.16169791 0.33044136 0.23285774 
    sim_64     sim_65     sim_66     sim_67     sim_68     sim_69     sim_70 
0.28654331 0.28840355 0.22428568 0.29760196 0.19003882 0.19452967 0.26087991 
    sim_71     sim_72     sim_73     sim_74     sim_75     sim_76     sim_77 
0.15527300 0.18205842 0.20899036 0.30198594 0.24463499 0.25147532 0.26776938 
    sim_78     sim_79     sim_80     sim_81     sim_82     sim_83     sim_84 
0.17385353 0.29640094 0.23350068 0.16686470 0.19611869 0.21006616 0.22712173 
    sim_85     sim_86     sim_87     sim_88     sim_89     sim_90     sim_91 
0.22747449 0.24432338 0.18933972 0.20576064 0.24514067 0.23442927 0.19190672 
    sim_92     sim_93     sim_94     sim_95     sim_96     sim_97     sim_98 
0.16070229 0.21801826 0.36018350 0.21864354 0.26860461 0.15831128 0.22369545 
    sim_99    sim_100    sim_101    sim_102    sim_103    sim_104    sim_105 
0.20036270 0.25745066 0.19497777 0.14809788 0.23170933 0.21038727 0.26293569 
   sim_106    sim_107    sim_108    sim_109    sim_110    sim_111    sim_112 
0.20451144 0.17995960 0.14804465 0.23002695 0.24173321 0.20336126 0.21497299 
   sim_113    sim_114    sim_115    sim_116    sim_117    sim_118    sim_119 
0.18983563 0.26762819 0.20087707 0.19080149 0.21603075 0.19708405 0.26295139 
   sim_120    sim_121    sim_122    sim_123    sim_124    sim_125    sim_126 
0.17303994 0.30004095 0.21240491 0.21738655 0.24440777 0.22619239 0.20958896 
   sim_127    sim_128    sim_129    sim_130    sim_131    sim_132    sim_133 
0.19273356 0.22130061 0.21364646 0.18668722 0.21965130 0.25853841 0.23037424 
   sim_134    sim_135    sim_136    sim_137    sim_138    sim_139    sim_140 
0.22213169 0.17204489 0.19248219 0.21809574 0.27696001 0.20558249 0.13198514 
   sim_141    sim_142    sim_143    sim_144    sim_145    sim_146    sim_147 
0.21904652 0.16326800 0.25310232 0.20891317 0.22268855 0.18677023 0.26188900 
   sim_148    sim_149    sim_150    sim_151    sim_152    sim_153    sim_154 
0.26587053 0.19469913 0.18952494 0.16205197 0.20827861 0.18443565 0.22789281 
   sim_155    sim_156    sim_157    sim_158    sim_159    sim_160    sim_161 
0.21633946 0.15984873 0.19983307 0.23347338 0.13385194 0.23670478 0.20621922 
   sim_162    sim_163    sim_164    sim_165    sim_166    sim_167    sim_168 
0.22929541 0.30900289 0.10087519 0.24966620 0.27227409 0.16113291 0.21781188 
   sim_169    sim_170    sim_171    sim_172    sim_173    sim_174    sim_175 
0.23765895 0.21142666 0.26245543 0.25959391 0.21549365 0.16671325 0.25381154 
   sim_176    sim_177    sim_178    sim_179    sim_180    sim_181    sim_182 
0.21392352 0.26040516 0.28008312 0.23572388 0.27682448 0.11624340 0.14670289 
   sim_183    sim_184    sim_185    sim_186    sim_187    sim_188    sim_189 
0.24295880 0.23476743 0.27453920 0.21444859 0.31198825 0.24145577 0.35317707 
   sim_190    sim_191    sim_192    sim_193    sim_194    sim_195    sim_196 
0.25413367 0.23798650 0.23942608 0.20427757 0.27464938 0.25864542 0.24163255 
   sim_197    sim_198    sim_199    sim_200    sim_201    sim_202    sim_203 
0.27686065 0.19089643 0.21983970 0.27632530 0.21548867 0.24881317 0.25721697 
   sim_204    sim_205    sim_206    sim_207    sim_208    sim_209    sim_210 
0.20341268 0.21208589 0.22963029 0.21717243 0.20983351 0.22659749 0.24710228 
   sim_211    sim_212    sim_213    sim_214    sim_215    sim_216    sim_217 
0.29994130 0.23261132 0.30098772 0.26901904 0.20887323 0.22100304 0.24235343 
   sim_218    sim_219    sim_220    sim_221    sim_222    sim_223    sim_224 
0.16288513 0.17214863 0.24508702 0.25717298 0.14507648 0.17157872 0.20748395 
   sim_225    sim_226    sim_227    sim_228    sim_229    sim_230    sim_231 
0.13431556 0.22452655 0.24964557 0.27380808 0.24566942 0.29362701 0.17785672 
   sim_232    sim_233    sim_234    sim_235    sim_236    sim_237    sim_238 
0.20060853 0.19003410 0.23388150 0.28524407 0.19126659 0.25699479 0.20741218 
   sim_239    sim_240    sim_241    sim_242    sim_243    sim_244    sim_245 
0.22638545 0.29631376 0.23025767 0.34432582 0.26426616 0.17785540 0.25853536 
   sim_246    sim_247    sim_248    sim_249    sim_250    sim_251    sim_252 
0.21264779 0.22904353 0.25121659 0.13842196 0.15110881 0.30317506 0.23869385 
   sim_253    sim_254    sim_255    sim_256    sim_257    sim_258    sim_259 
0.18849731 0.19478197 0.22173259 0.19248345 0.23357685 0.24545253 0.19248510 
   sim_260    sim_261    sim_262    sim_263    sim_264    sim_265    sim_266 
0.21917728 0.19907758 0.29012598 0.24042572 0.21514459 0.23828014 0.28537436 
   sim_267    sim_268    sim_269    sim_270    sim_271    sim_272    sim_273 
0.24413818 0.17589158 0.28209756 0.23210312 0.19844405 0.22157579 0.23924542 
   sim_274    sim_275    sim_276    sim_277    sim_278    sim_279    sim_280 
0.22207602 0.22341186 0.21939210 0.19810072 0.25474164 0.21757246 0.19621466 
   sim_281    sim_282    sim_283    sim_284    sim_285    sim_286    sim_287 
0.26559166 0.21209385 0.17546724 0.14466191 0.16446428 0.27100249 0.19108489 
   sim_288    sim_289    sim_290    sim_291    sim_292    sim_293    sim_294 
0.26155570 0.27818990 0.16776433 0.23408274 0.24122019 0.22390665 0.24080079 
   sim_295    sim_296    sim_297    sim_298    sim_299    sim_300    sim_301 
0.20885425 0.27332102 0.24748059 0.26529855 0.19199654 0.22189056 0.19596380 
   sim_302    sim_303    sim_304    sim_305    sim_306    sim_307    sim_308 
0.25064184 0.20818114 0.21491100 0.22810948 0.23091815 0.29680053 0.22555809 
   sim_309    sim_310    sim_311    sim_312    sim_313    sim_314    sim_315 
0.22331345 0.20423422 0.22914119 0.19851757 0.26899864 0.21846484 0.24054408 
   sim_316    sim_317    sim_318    sim_319    sim_320    sim_321    sim_322 
0.30065208 0.24726231 0.22003254 0.26261286 0.21739507 0.15070034 0.20874193 
   sim_323    sim_324    sim_325    sim_326    sim_327    sim_328    sim_329 
0.26560606 0.16918302 0.16567303 0.19757008 0.23688133 0.14528175 0.22532543 
   sim_330    sim_331    sim_332    sim_333    sim_334    sim_335    sim_336 
0.13139641 0.20515972 0.22472858 0.20259691 0.22442770 0.22068215 0.17894093 
   sim_337    sim_338    sim_339    sim_340    sim_341    sim_342    sim_343 
0.20160369 0.21989302 0.25586749 0.20622181 0.20739409 0.21374384 0.17076793 
   sim_344    sim_345    sim_346    sim_347    sim_348    sim_349    sim_350 
0.24904666 0.27743206 0.24762689 0.17258561 0.18862790 0.15727007 0.21727160 
   sim_351    sim_352    sim_353    sim_354    sim_355    sim_356    sim_357 
0.25785456 0.17746161 0.26079682 0.23520667 0.26398817 0.19528210 0.14731120 
   sim_358    sim_359    sim_360    sim_361    sim_362    sim_363    sim_364 
0.19143770 0.24486046 0.26935521 0.20023079 0.17103119 0.20445056 0.17705322 
   sim_365    sim_366    sim_367    sim_368    sim_369    sim_370    sim_371 
0.21452021 0.23641234 0.25558970 0.16003569 0.26981682 0.23191497 0.16657372 
   sim_372    sim_373    sim_374    sim_375    sim_376    sim_377    sim_378 
0.22000903 0.24068185 0.19371842 0.25963386 0.13563731 0.26118356 0.21428091 
   sim_379    sim_380    sim_381    sim_382    sim_383    sim_384    sim_385 
0.11532751 0.21317381 0.15629030 0.17713280 0.24320152 0.19903361 0.18015394 
   sim_386    sim_387    sim_388    sim_389    sim_390    sim_391    sim_392 
0.22555502 0.21171777 0.22056830 0.17644225 0.22345647 0.27083389 0.20695265 
   sim_393    sim_394    sim_395    sim_396    sim_397    sim_398    sim_399 
0.18168254 0.21807963 0.26497174 0.20103321 0.24719267 0.28774290 0.30268974 
   sim_400    sim_401    sim_402    sim_403    sim_404    sim_405    sim_406 
0.16431330 0.21958245 0.27865460 0.19940909 0.19913268 0.27102126 0.19773013 
   sim_407    sim_408    sim_409    sim_410    sim_411    sim_412    sim_413 
0.23082389 0.24329878 0.21339022 0.19817300 0.18541989 0.17701350 0.27445250 
   sim_414    sim_415    sim_416    sim_417    sim_418    sim_419    sim_420 
0.21966890 0.21519871 0.22163749 0.23605195 0.27479870 0.20210986 0.14122655 
   sim_421    sim_422    sim_423    sim_424    sim_425    sim_426    sim_427 
0.19479652 0.23285486 0.26832683 0.24656040 0.29343236 0.28934047 0.19753097 
   sim_428    sim_429    sim_430    sim_431    sim_432    sim_433    sim_434 
0.20869737 0.22660894 0.27398232 0.17475686 0.21483324 0.16846816 0.17320557 
   sim_435    sim_436    sim_437    sim_438    sim_439    sim_440    sim_441 
0.19953703 0.20586147 0.20303399 0.22479202 0.21878546 0.24281368 0.26590961 
   sim_442    sim_443    sim_444    sim_445    sim_446    sim_447    sim_448 
0.30895307 0.30217825 0.19211073 0.24595276 0.33664787 0.20819738 0.20526929 
   sim_449    sim_450    sim_451    sim_452    sim_453    sim_454    sim_455 
0.23333011 0.23596861 0.19447237 0.18282777 0.20336903 0.21844044 0.25491689 
   sim_456    sim_457    sim_458    sim_459    sim_460    sim_461    sim_462 
0.22683889 0.19801473 0.24432790 0.27238048 0.22162979 0.16552544 0.27343637 
   sim_463    sim_464    sim_465    sim_466    sim_467    sim_468    sim_469 
0.21741088 0.20764430 0.21471712 0.19813283 0.17393387 0.34508506 0.21691541 
   sim_470    sim_471    sim_472    sim_473    sim_474    sim_475    sim_476 
0.26756865 0.23328883 0.28471564 0.23781339 0.17928001 0.26349643 0.25802139 
   sim_477    sim_478    sim_479    sim_480    sim_481    sim_482    sim_483 
0.22645120 0.26527855 0.17329650 0.18604797 0.26000276 0.24633672 0.18728186 
   sim_484    sim_485    sim_486    sim_487    sim_488    sim_489    sim_490 
0.20948164 0.22759871 0.25615215 0.21814856 0.24199608 0.22670915 0.23087137 
   sim_491    sim_492    sim_493    sim_494    sim_495    sim_496    sim_497 
0.25525735 0.25929623 0.21851712 0.28481433 0.24099708 0.16125893 0.21779922 
   sim_498    sim_499    sim_500    sim_501    sim_502    sim_503    sim_504 
0.21304693 0.24122883 0.26040306 0.27777982 0.26498794 0.24331808 0.23892350 
   sim_505    sim_506    sim_507    sim_508    sim_509    sim_510    sim_511 
0.27878171 0.21380687 0.25317634 0.10330095 0.26811215 0.26310252 0.22483801 
   sim_512    sim_513    sim_514    sim_515    sim_516    sim_517    sim_518 
0.17975196 0.23155479 0.23366777 0.28598873 0.18467900 0.17993871 0.21452721 
   sim_519    sim_520    sim_521    sim_522    sim_523    sim_524    sim_525 
0.28482233 0.19433884 0.26659559 0.16523982 0.25576432 0.17870020 0.26536325 
   sim_526    sim_527    sim_528    sim_529    sim_530    sim_531    sim_532 
0.14581137 0.23184926 0.21550443 0.28450834 0.22184696 0.24396472 0.30390236 
   sim_533    sim_534    sim_535    sim_536    sim_537    sim_538    sim_539 
0.15762587 0.19824633 0.20710780 0.22725603 0.24453581 0.26463633 0.21590333 
   sim_540    sim_541    sim_542    sim_543    sim_544    sim_545    sim_546 
0.26327968 0.21520079 0.30082142 0.21459792 0.24601948 0.20916452 0.17090097 
   sim_547    sim_548    sim_549    sim_550    sim_551    sim_552    sim_553 
0.18993414 0.13774343 0.21504332 0.20139122 0.19759526 0.23624402 0.27942475 
   sim_554    sim_555    sim_556    sim_557    sim_558    sim_559    sim_560 
0.25009639 0.19143706 0.14401416 0.19111000 0.17739661 0.20495121 0.16323072 
   sim_561    sim_562    sim_563    sim_564    sim_565    sim_566    sim_567 
0.29985787 0.17436651 0.25767585 0.19194740 0.23954043 0.23083522 0.24897957 
   sim_568    sim_569    sim_570    sim_571    sim_572    sim_573    sim_574 
0.21172298 0.18617651 0.14977429 0.21345847 0.20471600 0.25514539 0.24725551 
   sim_575    sim_576    sim_577    sim_578    sim_579    sim_580    sim_581 
0.24633498 0.24572350 0.18235894 0.28153555 0.24394770 0.20166676 0.23123742 
   sim_582    sim_583    sim_584    sim_585    sim_586    sim_587    sim_588 
0.18614191 0.18763354 0.24340583 0.25717129 0.26890719 0.26417420 0.20332442 
   sim_589    sim_590    sim_591    sim_592    sim_593    sim_594    sim_595 
0.15908479 0.17310929 0.22956007 0.22347350 0.19914534 0.19078601 0.25660296 
   sim_596    sim_597    sim_598    sim_599    sim_600    sim_601    sim_602 
0.21407675 0.09915174 0.20853017 0.16912843 0.22880011 0.22125360 0.29626274 
   sim_603    sim_604    sim_605    sim_606    sim_607    sim_608    sim_609 
0.34627590 0.24042371 0.23598849 0.21364863 0.25178582 0.26466514 0.20340968 
   sim_610    sim_611    sim_612    sim_613    sim_614    sim_615    sim_616 
0.20718810 0.20312099 0.20955055 0.20579131 0.30534809 0.23400338 0.14119580 
   sim_617    sim_618    sim_619    sim_620    sim_621    sim_622    sim_623 
0.24001301 0.19416576 0.27605424 0.21270662 0.22408593 0.19887095 0.18872461 
   sim_624    sim_625    sim_626    sim_627    sim_628    sim_629    sim_630 
0.28856961 0.17731586 0.24610956 0.21508612 0.23138535 0.27266116 0.25326796 
   sim_631    sim_632    sim_633    sim_634    sim_635    sim_636    sim_637 
0.19368895 0.23848645 0.19694181 0.22747561 0.23195704 0.21375493 0.23343966 
   sim_638    sim_639    sim_640    sim_641    sim_642    sim_643    sim_644 
0.26289895 0.21476742 0.21044721 0.32136853 0.25351087 0.23344040 0.14275178 
   sim_645    sim_646    sim_647    sim_648    sim_649    sim_650    sim_651 
0.25545172 0.22564473 0.20204238 0.09793875 0.24404308 0.18242491 0.22690465 
   sim_652    sim_653    sim_654    sim_655    sim_656    sim_657    sim_658 
0.18221832 0.25120651 0.19594929 0.20048153 0.21845693 0.21342725 0.27367147 
   sim_659    sim_660    sim_661    sim_662    sim_663    sim_664    sim_665 
0.21774304 0.22257104 0.21618034 0.22802653 0.19258848 0.29718715 0.25359780 
   sim_666    sim_667    sim_668    sim_669    sim_670    sim_671    sim_672 
0.21830398 0.23424103 0.21124293 0.20629758 0.21543057 0.22531973 0.17900545 
   sim_673    sim_674    sim_675    sim_676    sim_677    sim_678    sim_679 
0.21625818 0.26931981 0.22664734 0.21019805 0.14136885 0.24152259 0.19071818 
   sim_680    sim_681    sim_682    sim_683    sim_684    sim_685    sim_686 
0.26179098 0.16692532 0.23625314 0.23968793 0.22021240 0.25200852 0.17596326 
   sim_687    sim_688    sim_689    sim_690    sim_691    sim_692    sim_693 
0.20484537 0.19239377 0.24334572 0.22563671 0.26688518 0.22122223 0.24208963 
   sim_694    sim_695    sim_696    sim_697    sim_698    sim_699    sim_700 
0.23189255 0.21099101 0.23446020 0.17467004 0.20444953 0.21503811 0.15836583 
   sim_701    sim_702    sim_703    sim_704    sim_705    sim_706    sim_707 
0.22202646 0.27684460 0.20842499 0.23129594 0.13007080 0.23808299 0.26921265 
   sim_708    sim_709    sim_710    sim_711    sim_712    sim_713    sim_714 
0.22804359 0.30200577 0.15065561 0.28217958 0.15066565 0.22411208 0.20938027 
   sim_715    sim_716    sim_717    sim_718    sim_719    sim_720    sim_721 
0.26157566 0.24536651 0.20350043 0.17335665 0.25177002 0.21664730 0.27401579 
   sim_722    sim_723    sim_724    sim_725    sim_726    sim_727    sim_728 
0.22134238 0.16542207 0.26983001 0.29283434 0.25513005 0.28573122 0.25084639 
   sim_729    sim_730    sim_731    sim_732    sim_733    sim_734    sim_735 
0.21729174 0.27583701 0.25328710 0.24974066 0.20584061 0.18716906 0.19104626 
   sim_736    sim_737    sim_738    sim_739    sim_740    sim_741    sim_742 
0.16355494 0.17318698 0.21762497 0.24951800 0.21290886 0.23618426 0.26690564 
   sim_743    sim_744    sim_745    sim_746    sim_747    sim_748    sim_749 
0.30439584 0.25938582 0.28719305 0.22759975 0.22957353 0.28081278 0.20682834 
   sim_750    sim_751    sim_752    sim_753    sim_754    sim_755    sim_756 
0.27227525 0.27685136 0.26287138 0.21982171 0.21650333 0.21169869 0.15706019 
   sim_757    sim_758    sim_759    sim_760    sim_761    sim_762    sim_763 
0.21709311 0.23165096 0.26378490 0.31008802 0.23267915 0.20135886 0.22987499 
   sim_764    sim_765    sim_766    sim_767    sim_768    sim_769    sim_770 
0.19699300 0.16331305 0.26375799 0.23995750 0.28225350 0.23035460 0.18533902 
   sim_771    sim_772    sim_773    sim_774    sim_775    sim_776    sim_777 
0.18348215 0.24496232 0.20858452 0.29651212 0.26750853 0.26369168 0.23111811 
   sim_778    sim_779    sim_780    sim_781    sim_782    sim_783    sim_784 
0.20383030 0.23283464 0.27131158 0.15209523 0.24652238 0.25951620 0.24224439 
   sim_785    sim_786    sim_787    sim_788    sim_789    sim_790    sim_791 
0.29837509 0.19247686 0.21155900 0.23897237 0.21016341 0.17283805 0.20849258 
   sim_792    sim_793    sim_794    sim_795    sim_796    sim_797    sim_798 
0.25410455 0.27077901 0.26970621 0.29389044 0.24614645 0.22436449 0.17536988 
   sim_799    sim_800    sim_801    sim_802    sim_803    sim_804    sim_805 
0.24005082 0.25970320 0.14766385 0.22005480 0.25219694 0.20583495 0.23409499 
   sim_806    sim_807    sim_808    sim_809    sim_810    sim_811    sim_812 
0.21502639 0.15655550 0.22878045 0.24030043 0.18067538 0.22386274 0.32769791 
   sim_813    sim_814    sim_815    sim_816    sim_817    sim_818    sim_819 
0.22840971 0.21119578 0.21016859 0.17062218 0.24899992 0.17536063 0.28078707 
   sim_820    sim_821    sim_822    sim_823    sim_824    sim_825    sim_826 
0.22169208 0.24140911 0.34506594 0.24599727 0.27699085 0.25973195 0.25544815 
   sim_827    sim_828    sim_829    sim_830    sim_831    sim_832    sim_833 
0.28915518 0.19164030 0.23357880 0.18241795 0.23163730 0.20892166 0.30018131 
   sim_834    sim_835    sim_836    sim_837    sim_838    sim_839    sim_840 
0.24330075 0.24264169 0.23152980 0.22427074 0.19229920 0.21930569 0.26368446 
   sim_841    sim_842    sim_843    sim_844    sim_845    sim_846    sim_847 
0.20060247 0.27739951 0.24946240 0.25068789 0.22626126 0.14178086 0.10495538 
   sim_848    sim_849    sim_850    sim_851    sim_852    sim_853    sim_854 
0.14379511 0.22742204 0.19547564 0.21994078 0.22190941 0.21718756 0.17799387 
   sim_855    sim_856    sim_857    sim_858    sim_859    sim_860    sim_861 
0.16475724 0.22015940 0.17947279 0.28219180 0.23552952 0.32031987 0.21352569 
   sim_862    sim_863    sim_864    sim_865    sim_866    sim_867    sim_868 
0.28327060 0.23721595 0.26968515 0.13448521 0.16160322 0.16436899 0.13267248 
   sim_869    sim_870    sim_871    sim_872    sim_873    sim_874    sim_875 
0.23564075 0.22633150 0.28150219 0.22450230 0.25839056 0.24714680 0.15226751 
   sim_876    sim_877    sim_878    sim_879    sim_880    sim_881    sim_882 
0.26105188 0.18665090 0.22480730 0.21834005 0.19224684 0.21537073 0.19395011 
   sim_883    sim_884    sim_885    sim_886    sim_887    sim_888    sim_889 
0.17682835 0.17778879 0.18443464 0.20600682 0.21731572 0.20395747 0.25753473 
   sim_890    sim_891    sim_892    sim_893    sim_894    sim_895    sim_896 
0.23349443 0.17256897 0.19725730 0.20186769 0.17405615 0.24108758 0.16371733 
   sim_897    sim_898    sim_899    sim_900    sim_901    sim_902    sim_903 
0.19881046 0.22734189 0.27716187 0.19742679 0.25451794 0.18030984 0.20511657 
   sim_904    sim_905    sim_906    sim_907    sim_908    sim_909    sim_910 
0.26372743 0.17316468 0.26519589 0.16176321 0.21560319 0.24343270 0.20037257 
   sim_911    sim_912    sim_913    sim_914    sim_915    sim_916    sim_917 
0.19116068 0.22947220 0.26134865 0.28525613 0.19037585 0.23803476 0.22847206 
   sim_918    sim_919    sim_920    sim_921    sim_922    sim_923    sim_924 
0.29180731 0.27471136 0.18720782 0.21310244 0.20876308 0.23699067 0.22396042 
   sim_925    sim_926    sim_927    sim_928    sim_929    sim_930    sim_931 
0.24318275 0.17675095 0.21027677 0.22940603 0.24136424 0.21592415 0.16011384 
   sim_932    sim_933    sim_934    sim_935    sim_936    sim_937    sim_938 
0.25709021 0.26619325 0.23756312 0.19059668 0.23456843 0.34184119 0.24466651 
   sim_939    sim_940    sim_941    sim_942    sim_943    sim_944    sim_945 
0.21379225 0.20806281 0.24567435 0.18974867 0.23123240 0.22695301 0.18988286 
   sim_946    sim_947    sim_948    sim_949    sim_950    sim_951    sim_952 
0.16857435 0.22102229 0.19992963 0.26721090 0.22765962 0.27487358 0.20501929 
   sim_953    sim_954    sim_955    sim_956    sim_957    sim_958    sim_959 
0.19763926 0.26313933 0.25127792 0.22640643 0.16347985 0.20030193 0.22630053 
   sim_960    sim_961    sim_962    sim_963    sim_964    sim_965    sim_966 
0.16516114 0.19058421 0.16604114 0.31122288 0.19969792 0.20354809 0.16753966 
   sim_967    sim_968    sim_969    sim_970    sim_971    sim_972    sim_973 
0.24871500 0.21307872 0.22291048 0.23891831 0.27522200 0.22219604 0.18832574 
   sim_974    sim_975    sim_976    sim_977    sim_978    sim_979    sim_980 
0.23179196 0.23873620 0.22065212 0.22592045 0.23937310 0.20465849 0.17676614 
   sim_981    sim_982    sim_983    sim_984    sim_985    sim_986    sim_987 
0.20378318 0.18432447 0.22265830 0.15938733 0.23005826 0.18847937 0.24611926 
   sim_988    sim_989    sim_990    sim_991    sim_992    sim_993    sim_994 
0.20030880 0.30898505 0.14621784 0.28304340 0.22044399 0.16953559 0.19401399 
   sim_995    sim_996    sim_997    sim_998    sim_999   sim_1000 
0.23369440 0.33003064 0.16894296 0.22895899 0.33309221 0.24296935 
Code
# Histogram of the simulated R-squared values stored in reg_sim (sim_1 to sim_1000)
tibble(rsquared = reg_sim) |> 
  ggplot(aes(x = rsquared)) +
  geom_histogram(bins = 30,
                 color = "darkorange",
                 fill = "orange")